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SUMMARY 

In  this  paper  an  algorithm  fcr  calculating  roots  is  given  that  is 
Newton's  method  initialized  with  a piecewise  best  starting  approxima- 
tion. The  piecewise  best  starting  approximation  corresponds  to  a 
partition  of  the  interval  of  the  domain  of  Newton’s  method  and  it  is 
shown  how  to  choose  this  partition  to  be  optimal.  Explicit  formulas 
are  given  when  linear  polynomials  are  used  for  the  best  starting  ap- 
proximations. Specific  examples  are  given  for  square  roots,  cube 
roots  and  reciprocal  square  roots. 

1.  INTRODUCTION 

An  effective  algorithm  for  calculating  roots  is  Newton's  method 
initialized  with  a best  starting  approximation.  Recently  [2,3j,  this 
procedure  has  beer,  modified  in  that  a piecewise  best  starting  approxi- 
mation was  used  for  initializing  the  Newton  iteration.  This  is  equi- 
valent to  subdividing  the  interval  of  application  of  the  Newton  itera- 
tion into  subintervals  and  applying  the  theory  of  best  starting  approxi- 
mations to  each  subinterval.  In  this  paper,  we  shall  describe  how 
this  subdivision  can  be  done  in  an  optimal  manner. 
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The  theory  of  best  starting  approximations  for  calculating  roots 


was  first  studied  by  Moursund  [10]  for  the  special  case  of  square  roots. 

ItMniWNMIlU' 

This  theory  was  extended  to  general  roots  by  Moursund  and  Taylor  in 


[11].  Subsequent  studies  found  that  the  best  starting  approximation 
for  calculating  roots  via  Newton's  method  is  independent  of  the  number 
of  iterations  to  be  used  and  is,  in  fact,  a multiple  of  the  best  rela- 
tive approximation  to  the  root  [8,  12,  13,  14,  15,  17].  Surprisingly, 
it  was  also  shown  [3,  12,  14]  th3t  one  of  the  square  root  subroutines 
in  use  prior  to  the  development  of  this  theory  [5],  was,  in  fact,  the 
method  of  Moursund. 

This  theory  allows  considerable  leeway  in  desigining  a specialized 
root  routine.  In  the  case  of  large  scale  computers,  it  is  possible  to 
design  routines  that  return  a predetermined  accuracy  and  in  most  cases 
have  a decreased  tine  lag  when  compared  to  calling  the  system's  subroutine. 
Whereas,  for  microprocessors  these  algorithms  can  be  incorporated  as 
firmware  support  whenever  the  ability  to  calculate  roots  is  desired. 

For  example,  in  [2,3]  square  root  routines  of  this  type  were  developed  for 
8-bit  and  16-bit  microprocessors,  where  it  was  desired  for  the  16-bit 
routine  to  develop  an  algorithm  which  would  give  15  bits  of  accuracy  after 
one  Newton  iteration  initialized  with  a linear  polynomial  and  have  as  its 

i 2*6 

domain  of  application  all  numbers  of  the  form  X * ( lpJj-2^+1'  or^er  t0 

develop  such  an  algorithm  it  was  necessary  to  partition  the  point  set  into 
X - XXU  X2  U X3,  Xx  * X n (1/4,  Cj],  X2  - X n (cr  c0]  and  Xj  a X D (c0,  1] 
for  appropriately  chosen  c^  and  cj.  Treating  each  of  these  three  sets 
independently  it  is  possible  to  develop  a square  root  algorithm  satisfying 
the  constraints  listed  above  (with  the  exception  of  the  domain  constraint). 
Thus,  the  algorithm  used  a piecewise  best  linear  polynomial  starting  approxi- 
mation — defined  to  be  optimal  on  X^,  X,  and  X separately.  The  actual  use 
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of  this  algorithm  would  first  involve  a scaling  of  the  given  positive 
number  by  an  even  power  of  2 to  produce  a number  in  X.  Next,  it  would 
determine  which  subset  (X, , X^  or  X^)  contains  the  scaled  number  and 
then  evaluate  the  best  linear  starting  approximation  at  the  scaled 
number.  This  value  is  then  used  to  initialize  one  Newton  iteration 
for  calculating  square  roots.  The  result  of  this  iteration  is  then 
multiplied  (shifted)  by  2 to  half  the  power  of  the  original  scaling  and 
then  this  value  is  returned  as  the  desired  square  root.  This  algorithm 
was  compared  with  the  corresponding  direct  and  Cordic  type  of  methods 
[16]  including  Chen's  modified  version  [4]  and  was  found  to  be  preferable 
for  the  types  of  architectures  considered.  A second  example  was  in  the 
development  of  a reciprocal  square  root  routine  using  a divide-free 
Newton  iteration  for  inclusion  in  the  particle  moving  section  of  a 
relativistic  plasma  code  on  an  IBM  360/91.  The  design  constraints  in 
this  case  were  a required  accuracy  of  10  ^ after  one  Newton  iteration 
initialized  with  a linear  polynomial  on  [1/8,  1/2].  Here  the  interval 
[1/8, 1/2]  was  divided  into  five  subintervals  in  order  to  satisfy  these 
constraints. 

The  main  result  of  this  paper  is  the  following:  Suppose  that  one  wishes 
to  subdivide  an  interval  [a,  b]  into  v subintervals  and  calculate  a root 
via  Newton's  method  on  [a,  b]  by  calculating  independently  on  each  sub- 
interval. Then  there  exists  a unique  partition  that  is  optimal  with 
respect  to  the  relative  error.  Even  though  this  result  is  for  the  continuous 
case  it  is  still  useful  for  actual  applications  since  it  suggests  where  to 
subidivide  in  a discrete  setting.  The  precise  organization  of  this  paper 
is  as  follows.  Section  2 contains  a summary  of  the  definitions  and  basic 
theoretical  results  of  best  starting  approximations,  section  3 gives  our 
general  results  and  section  4 gives  specific  examples  of  this  theory  for 
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calculating  square  roots,  reciprocal  square  roots  using  a divide  free 
iteration  and  cube  roots,  each  subject  to  certain  design  constraints. 


2.  DEFINITIONS  AND  BASIC  NOTIONS 

Let  [a,b]  be  a fixed  interval  with  0 < a < b and  set 
^[a,b]  = {R=P/Q  : P Q £ Q(x)  > 0 for  all  x £ [a,b],  (P,Q)-1) 

where  denotes  the  class  of  all  real  algebraic  polynomials  of  degree 
less  than  or  equal  to  k and  (P,Q)  denotes  the  greatest  common  (poly- 
nomial) divisor  of  P and  Q.  Fix  a a real  number,  a / 0 or  + 1,  and 
define  N^:  C+[a,b]  -*•  C[a,b],  where  C^Ca.b]  denotes  the  class  of  all 
continuous  positive  functions  defined  on  [a,b],  by 

N (h)  (x)  * af(S  - l)h(x)  + — r-~ ],  3 = 1/a. 

a L he_1(x) J 

Observe  that  Na(h)(x),  for  fixed  x is  simply  the  result  of  one  Newton 
iteration  for  calculating  x^  with  h(x)  as  its  starting  approximation 
(or  initial  guess).  That  is,  the  formula  for  N is  simply  the  result 

q 

of  applying  Newton's  method  to  y - x = 0,  x fixed.  As  usual,  we  also 

define  by  N^'(h)(x)  = N (N^  ^(h))(x),  the  result  of  k Newton  iterations, 
a a a a 

Then  R £ /£^[a,  b]  is  said  to  be  the  best  (relative)  starting  approximation 
fromf2™[a,  b]  for  calculating  a-roots  on  [a,  b]  provided 


(1) 


nQ[a,b]> 


- N t (R  )(x) 


min 


x°  - N~ (R) (x) 


[a.b]  Re^ta.bJ  11 


(a,b] 


where  ]|f(x)||  ^ ^ * tr.ax{|f(x)|  : x € [a,b]}  for  f £ C [a,  b].  We  shall 
suppress  the  subscript  [a,  b]  on  ||  • j]  whenever  the  meaning  is  clear.  Thus 
the  relative  error  of  approximating  xa  with  one  Newton  iteration  for  calculatir 

xa  with  initial  guess  R(x)  is  minimized  on  the  interval  [a,b]  if  R*(x)  is  used 

* 

as  the  initial  guess.  It  is  shown  in  [6,9]  that  R exists,  is  unique  and  is 


a multiple  (depending  upon  a,  [ a , b],  m and  n)  of  the  best  relative 


approximation,  R(x),  to  x3  from  ™[a,  b]*  i.e- 


(2) 


R(x) 


x'1 


mtn 

R«*m[a.b] 

n 


x - R (x) 


x3 


. From  the 


general  theory  of  uniform  relative  approximation  [1],  it  is  known  that 
R(x)  exists,  is  unique  and  can  be  calculated  by  various  methods  (see  for 

a 


example,  [7]).  In  fact  [6,9],  if 


Rfx) 


X then 
a 


R (x)  = y^R(x)  where 

(3)  Va  ” [(a  + Xa)6_I  " O - " l)Aa(l  - X2)S_1]  , S = l/a 

it  ra 

and  this  same  R (x)  is  also  the  best  starting  approximation  f rom  ,b] 


for  k Newton  iterates,  i.e. 


naCa,b]  - 


x3  - N>  (R* ) (:■:)  j[ 


x» 


min 

R€Xra[a,b] 

n 


Xa  - n!(R)(x)m 


x) 

I 


In  closing  this  section,  we  would  like  to  remark  that  a theory 
Of  best  (absolute)  starting  approximations  for  calculating  roots,  i.e. 


Inf 

RCXfYa.b] 


|x3  - N*(R)(x)||, 


k a positive  integer  is  neither  as  well  developed  nor  as  rich  as  the 
corresponding  relative  theory.  It  is  known  [9]  that  best  absolute 
starting  approximations  exist,  are  unique  and  can  (in  theory)  be  cal- 
culated by  a Remes  type  algorithm  or  a generalized  differential  cor- 
rection algorithm  [7].  Whether  or  not  best  absolute  starting  approxi- 
mations are  a multiple  of  some  other  well  known  approximation  to  x1 
is  not  known  (they  are  not  a multiple  of  the  best  uniform  approxima- 


tion to  x3)  and  optimal  partitioning  results  corresponding  to  what  we 


shall  prove  for  the  relative  case  are  not  known.  Thus,  unless  expli- 
citly stated  to  the  contrary,  we  shall  be  concerned  with  the  relative 
theory  In  what  follows. 
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3.  MAIN  RESULTS 


In  this  setting,  we  wish  to  first  prove 

' 'k  jjj 

Theorem  1 . If  R £ )Cn[a,b]  is  the  best  starting  approximation  from 
£™[a,b]  for  calculating  a-roots  on  [a,b]  then  R(t)  = p°R  (t/p), 

TH 

pa  < t < pb,  p > 0 is  the  best  starting  approximation  from  % [pa,  pb] 

n 

for  calculating  a-roots  on  [pa,  ob].  Furthermore,  i [a,b]  = n. [pa,  pb] 


Proof:  We  are  given  that  n. [a,b] 

CL 


x - N-.  (R  ) (x)  i 


Define  the  change  of  variables  x = t/p.  Then,  for  pa  <_  t <_  pb,  by 
direct  substitution  we  have  that 


1 t°  - N-,  (R)  (t ) f 

x°  - N„(R*)(x) 

1 t«  1 

1 

[p.a,pb] 

X* 

a,b] 


min 


x“  - Nq(R)(x) 


RG^?JCa,b3  if  x 


[a.b] 


min 

pm 


t°  - N_j(paR)  (t/p) 


p RCXg[pa,pb]jj 


I [pa,pb  ] 


“ nQ[pa,pb]. 

Thus,  by  definition  R(t)  = p R*(t/p)  is  the  best  starting  approximation 
from£^[pa,  pb]  for  calculating  a-roots  on  [pa,  pb]  and,  by  the  first 
comment  of  the  proof,  na[a,  b]  =na[pa,  ob],  I 

Using  this  result,  we  are  able  to  prove  our  optimal  partitioning 
result.  The  flavor  of  this  result  is  the  following.  Suppose  that  one 
wishes  to  subdivide  the  interval  [a,  b]  a < b,  into  v subintervals 

and  calculate  a-roots  on  [a,  b]  by  actually  calculating  a-roots  indepen- 
dently on  each  subinterval.  Then,  it  turns  out,  that  there  exists  a unique 
partitioning  of  [a,  b]  into  v subintervals  such  the  relative  error  of 
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approximating  with  a Newton  iterate  initialized  on  each  subinterval 
with  the  best  starting  approximations  for  that  subinterval  is  minimal 
over  all  such  partitions  of  [a,  b]  and,  in  fact,  the  relative  error  on 
each  subinterval  is  the  same. 


Theorem  2.  Let  ~p  =’  (d  : d = {d„,  d, , ....  d } with  a = d„  < d,  < d„  < . . . 
--01  v 012 

< d^_^  < d^  = bl  be  the  set  of  all  partitions  of  [a,  b]  into  v subintervals. 
Then,  there  exists  one  and  only  one  partition,  c = (cq,  c^,  ...»  c^}  £ P 
for  which 


max  n [c., 
0<i<v-l  a 1 


min  max  n [d.,  d ]. 
&P  0<_i<_v-l  a 1 i+L 


This  unique  partition  is  given  by  the 


formulas 


afa-j)/vb  j /v. 


j 0 * 1 » • 


1. 

In  addition,  this  theorem  holds  with  n replaced  bv  n for  k a positive 

a • 1 a 

k k 

and  n [c. , c.  , ] = n Cc.  , , c.  ] for  all  k,  a and  i. 
a i i+l  a l+l  i+2 


. ,v. 

integ 


Proof : First,  observe  that  for  the  partition,  q • -cj 

lV  Vl1  * [ puc0’  Vl1  where  PU  = (l)y/v*  for  W * •• 

Thus,  by  Theorem  1,  c ^ = naCc0,  c1]  for  u = 1,  .. 


we  have  that 
. , v - 1. 

. , v - 1. 


To  prove  the  ninnax  statement  of  Theorem  2 (which  will  also  establish 
the  uniqueness  claim)  we  prove  the  following  result  first.  Namely,  if 

g C 

[a,  b],  0 < a < b and  [c,  d],  0 < c < d are  any  two  intervals  and  ^ 
then  nQ[a,  b]  > n^Jc,  d].  Now  by  Theorem  1,  we  can  replace  [c,  d]  by 
[pc,  pd]  where  0 = ~ and  n.Jc,  dj  - na(pc,  Pdl*  Tf"-U3  setting  e=od<b,  we 
shall  prove  that  n [a,  b]  > n [a,  e]  which  will  establish  this  result. 

To  do  this,  let  R(x)  be  the  best  relative  approximation  to  x on  [a,  b] 

from  ■C'Hfa,  bj.  Define  the  defect,  d,  of  R(x)  bv  d = min(m  - 3P,  n - 3Q) 

n 
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where  R(x)  = P(x)/Q(x)  and  3P  denotes  the  exact  degree  of  the  polynomial 

P.  Then,  by  the  standard  theory  of  best  relative  approximation  [1], 

there  exists  at  least  N = n + m+  2-  d extreme  points,  a <_  x^  < x-,  < ... 

P(x) 

< Xjj  _<  b on  which  the  error  curve  E(x)  = 1 - : — — alternates,  i.e., 

x1 

|E(xi)|  = ]|e||,  i = 1,  ....  M and  E(x,)  = -E(xi+1),  i = 1,  2,  ...,  N - 1. 
Since  E G C1[a,  b),  we  must  have  that  E* (x^)  * 0 for  at  least  i*2,...,N-l. 
Thus,  E' (x)  must  have  at  least  n + m - d zeros.  Now, 


fc’(x) 


xa~1[xQ(’x)P'  (x)  - PfxH'Hx)  xO ' f x ) ) 1 

(xaQ(x))2 


and  since  a > 0,  and  the  degree  of  the  polynomial  in  the  brackets  in  the 

numerator  equals  3?  + 3Q  which  is  less  than  or  equal  to  n + m - 2d, 

we  see  that  E'(x)  can  have  at  most  n + m - 2d  zeros  in  [a,  b] . Comparing 

these  two  zero  counts,  ve  see  that  we  must  have  d = 0,  3P=m,  3Q=n, 

Xj  ■ a,  x^  = b and  that  E(x)  must  have  precisely  N = n + m + 2 extreme 

points  in  [a,  b].  Here,  y € [a,  b]  is  said  to  be  an  extreme  point  if 

|E(y)|  * ||E||  . Thus,  R is  not  the  best  relative  approximation  to 

on  [a,  e]  from  £^[a,  e]  since  E(x)  does  not  have  the  necessary  alternatin 

behavior  on  [a,  e].  Let  R,  € ;Tn[a,  e]  be  the  unique  best  relative 

i m 

approximation  to  x on  [a,  e].  We  claim  that  R^  is  not  a multiple  of 

ft.  This  follows  from  the  above  zero  counting  argument,  since  for  any 

cRfx) 

real  c i1  0,  E (x)  = 1 must  be  such  that  E'(x)  vanishes  in  [a,  b] 

C 01  c 

X 

only  where  E'(x)  vanishes  in  [a,  b],  imply Lng  that  cR  does  not  have  the  necessary 

number  of  alternations  to  be  the  best  relative  approximation  to  xa  in  [a,  e] 

Since  the  best  starting  approximations  for  calculating  a-roots  from 

f”[a,  e]  and  ,L™[a,  M on  [a,  e]  and  [a,  bl  are  multiples  of  R,  and  R, 

respectively  and  R C Cn[a,  e]  we  must  have  that 

m 
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n [a,  e]  - min 


RcXSU.e] 


xu  - N (R)  (x) 

01 


< nafa,  b] 


ta.e] 


by  uniqueness. 

Now  let  d = {d.}^  a = dn  < d.  < ...  < d = b,  be  a different 
3 j=0  01  v 

partition  of  [a,  b]  than  the  one  given  by  the  formulas  in  the  hypothesis 

of  Theorem  2.  Then  we  claim  that  for  some  j,  0<j  < v - 1,  -p-d.,,  > c. 

- - d,  j+1  1 

must  hold.  Indeed,  if  — dj+^  _i  for  all  j with  strict  inequality  holding 
at  least  once,  say  at  j^,  0 <_  j^  £ v - 1 where  j is  the  first  index  where 
strict  inequality  holds  (such  an  index  must  exist  since  the  partitions 
are  distinct),  then  we  must  have  that  d^  <_  c^  for  0 _<  j £ j^  and  d..  < Cj 
for  j^  < j _<  v.  This  will  give  a contradiction  since  we  must  have 
dy  * cy  * b.  Now,  this  assertion  is  proved  by  an  inductive  argument  as 
follows.  Since  dn  = Cq  = a,  we  assume  that  d.  _<  c^  hold  for  some  j. 


J - j 


0 <.  j _£  j,  — 1 then  d.  . <_  — d.  by  our  original  assumption  so  that 

. -1/v  1 h.  , -1/v  1/v  , _ . . 

dj+1  1 c0  S dj  - c0  cv  Cj  = c Thus,  for  0 <_  j <_  we  have 

a 

d,  < c..  For  j = j,,  the  assumption  ~r~ d...  < c.  implies  d.  < — d, 

J — J 1 dj  j+1  - a jj+1  a 

and  then,  by  induction  again,  we  find  that  d < c. 


< c:1/vc1/vc. 
— 0 v j 


1 


Jl+1 


j 


for  j.  < j < v as  claimed.  Thus,  -r-d  < c.  cannot  hold  for  all  j. 
l dj  j+1  1 

Let  j,  0 _<  J _<  v - 1 be  an  index  for  which  > holds.  Then 

the  interval  [d.,  d ] is  such  that  [-p-d,,  -p-d.,,]  = [a,  e.]  with  e.  > c,  . 

J j+i  dj  j d j j+1  j j 1 

Hence,  by  our  preceding  work  we  have  that  n^fdj,  d^+^]  > nafa,  c, ] as  desired. 

From  this  it  follows  that  max  n [c.,  c.  .]  < max  n (d.,  d.^,] 

0<i<v-l  ^ * l+t  o<i<v-l  ^ i i 1 

for  each  d £ '/'with  d i c.  To  see  that  this  is  also  true  for  r,  replaced 

* % a 


with  n^, k a positive  integer,  one  need  only  observe  that  Is  a strictly 
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pointvise  monotone  one-sided  operator  [9].  What  this  implies  is  that 

N (R)(x)  > x3  for  x € [a,  b]  and  R(x)  / xa  and  if,  N (R. ) (y)  > N (R,) (y) 

3 ci  i a z 

v v 

for  some  y £ [a,  b]  then  N^(R^)(y)  > N^CR-My)  for  k a positive  integer. 

From  this  observation  the  final  result  readily  follows.  B 

Before  applying  this  theory  to  some  specific  examples,  we  wish  to  study 
the  problem  of  finding  best  starting  approximations  from^[a,  b]  l 
for  calculating  nt‘1  roots  on  [a,  b].  In  this  very  simple  case,  it  is 
possible  to  give  analytical  formulas  for  the  best  relative  approximation 
to  x1/n  from  on  [a,  b]  and,  therefore,  also  for  the  best  starting 

— f.  U. 

approximation  from  II,  for  calculating  n roots  on  [a,  b] . See  reference 
[13]  where  this  result  has  also  appeared. 

Theorem  3.  Fix  the  interval  [a,  b],  0 < a < b.  Then  the  best  (linear) 
relative  approximation  to  x^n  on  [a,  b]  from  I!.,  p(x)  = ax  + 6,  is 

given  by 


(4) 


/,  1/n  l/nwi 
(b  -a  ) (1 


X) 


b - a 


(5) 


(ba1/n  - ab1/n)(l  - >) 
b - a 


where 


(6) 


l/n  . 

x - p(x) 


1/ n 


min 

pGnx 


l/n  , . 

x - p(x) 

l/n 


w - 1 
w + 1’ 


and 


(7) 


“ _JL_  ( 
n - 1 \ 


ba 


1/n  - ab1/n  Ufa  - D(b1/n  - a1/n>  ' 


l/n 


b - a 


M 


ba1/n  -ab1/n 


Proof : By  referring  to  the  proof  of  Theorem  2,  we  know  that  the  error 


curve  E (x) 
n 


1 


P(x) 

xi/n 


1 


ax  4-  6 
. 1/n 


must  have  precisely  three  extreme 
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points,  a,  £»  b,  a < 5 K b and  that  p will  satisfy  the  following  system 
(the  unknowns  are  a,  S,  £ and  X) 


1 - a ^n(cta  + 2)  “ X 
1 - c 1/n(aC  + B)  - -X 
1 - b_1/n(ab  + 2)  = X 


(n  - l)af1/n  - 6r(n+1)/n  = 0 


where  the  fourth  equation  is  the  derivative  of  the  error  curve  at  £ set 
equal  to  0.  Also,  since  the  best  linear  relative  approximation  is  unique, 
we  have  that  there  exists  one  and  only  one  solution  to  this  system. 

Solving  simultaneously  for  a and  2 in  terms  of  1 - X in  equations  1 and  3 
gives  (4)  and  (5)  of  the  theorem.  Substituting  this  in  equation  4 gives 
an  expression  for  £ and  then  substituting  all  these  values  in  equation  2 
gives  the  formula  for  X.  £ 


Corollary  1.  The  best  starting  approximation  from  lu  for  calculating 

roots  is  p*(x)  = y,  . p(x)  where  p is  defined  in  Theorem  3 and  y . 

1/n  1/n 

is  given  by  (3)  with  “ a 8 


4.  EXAMPLES 

In  this  section  we  give  specific  examples  of  the  above  theory 
for  computing  square  roots,  reciprocal  square  roots  using  a divide-free 
Newton  iteration  and,  finally,  cube  roots.  In  the  first  two  examples 
we  shall  only  use  best  starting  approximations  from  and  consider  what 
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happens  when  at  roost  two  Newton  iterations  are  required.  For  the  cube  root 
case  we  shall  also  consider  other  classes  of  rational  functions  for  the 
initialization  of  the  Newton  iteration. 


A.  Square  Roots 

To  develop  an  algorithm  for  computing  square  roots  based  upon  the 
preceding  theory,  we  roust  first  select  an  interval  of  application.  Any 
interval  of  the  form  [a,  4a],  a > 0 will  do;  however,  the  reasonable 
choices  are  intervals  such  as  [x,  -g-] , [t,  1]  or  f-g,  2).  We  shall  use 

O L + L 

[~,  2]  as  our  interval  of  application  here.  Thus,  an  algorithm  for  calcu- 
lating square  roots  based  on  [-^,  2]  will  have  the  following  components. 
First  of  all,  the  algorithm  will  have  a scaling  feature.  Thus,  to  find 
>/y,  y > 0 the  algorithm  will  first  scale  y;  that  is,  it  will  calculate  m, 

an  integer  for  which  y = 2 x and  x £ 2],  Next,  it  will  compute  one 

lx  f — 

or  more  Newton  iterates,  ~ ),  to  calculate  *x 

using  the  above  theory  with  a best  linear  polynomial  starting  approximation 


1 3J 

on  2].  It  will  then  multiply  (shift)  this  final  value  by  2 
and  return  this  for  the  value  *'y,  For  a = ^ the  formulas  of  Theorem 
3 and  (3)  for  the  interval  [a,  b]  reduce  to 


(8) 


1/4  _ 1/4 V 


. - ax/  ] 

*1/2  " + a1/4J 


(9) 


1 - X 


b1/2  + aI/2 


(10) 


O 1/2.  i/2 

6 " a b a 


y1/2 


(1  - X 


-1/2 


(ID 
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Thus,  the  best  r>  lative  appr 
+ .4552813742  with  deviation 

T 

approximation  from  3.  on  [— , 

X 4. 

= = . 4849603523(  x + 


xirr.at icn  to  x'x  on  [^-,  2]  is  ?( x )= . 4e52S137u2x 

- -.02343725152  so  that  the  best  starting 

2]  for  calculating  square  roots  is  p*(x) 

1)  with  (relative)  error  n,  .»[—,?]= .000236107 

x/ 2 2 


which  can  be  calculated  by  evaluating 


1 


2<P*)(x>| 

H.  I 


for  x either  of  the 


endpoints  of  the  interval  [a,  b]  (here  x = | or  2).  Thus,  after  one  Newton 
iteration,  we  are  guaranteed  an  approximation  for  /x”  wit)  (relative)  error 
<_  .396107  x 10  4 on  [j,  2].  In  addition,  n^0[sr.  2]  = 7.841  x 10~8,  so 
that  if  one  were  to  calculate  square  roots  by  this  method  using  two  Newton 

iterates  the  (relative)  accuracy  of  the  approximation  cn  [x-,  2]  to  /x  would 

-8 

be  <_  7.341  x 10  . (All  calculations  were  done  on  a Texas  Instruments  2?-5 

i 

Next,  consider  the  case  where  the  interval  [— , 2]  is  to  be  subdivided 

L 

1 

into  two  subintervals.  By  Theorem  2,  the  optimal  subdivision  is  [-r,  2] 

= [*-,  1]  U [1,  2].  Thus,  to  calculate  /y,  y > 0 in  this  case,  tne  algsritb 
would  again  scale  y,  y = 2^nx,  x £ (-j,  2].  However,  it  must  next  determin- 
which  subinterval  contains  x by  comparing  x with  1.  After  doing  this  *he 
algorithm  proceeds  as  above  with  the  best  starting  linear  polynomials 
given  below,  using  the  polynomial  that  corresponds  to  the  interval  contains 
x. 


For  [i,  1]  the  best  starting  approximation  from  H.  for  calculating  square 
roots  on  [j,  1]  is  p*(x)  = "^^(x)  = .59017S5321x  + .4173192421  with 

nl/2fT’  1]  = 2’739912  x 10~5  and  nV2C|’  1]  = 3,9  * 10'10* 

For  the  second  subinterval.  Cl,  2],  the  best  starting  approximation  frcm 

nx  for  calculating  square  roots  on  [1,  2]  is  p*(:<)  = .4l73132421x+. 5301"  :5 2 
so  that  n1/2Cl,  2]  = 2.789912  x IO-5  = n1/2Cy,  1]  and  n?/;Cl,  2]  = n?/2Cy. 

* 3.8  x 10  ^ (as  Theorem  2 predicts). 
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Finally,  for  the  square  root  case  let  us  consider  this  algorithm  when  we 
subdivide  into  three  subintervals.  In  this  case,  we  shall  scale  to  the 
interval  1]  so  that  a comment  with  respect  to  [3]  can  be  made  (the  choice 
of  C~»  1]  rather  than  [y,  2 ] in  that  paper  was  simply  a preference  for  how 

we  viewed  the  fixed  point  numbers  in  our  microprocessor).  In  this  case  the 

, . . . .....  • rl  r1  r/l  \ 2/3  .1.1/3,  , , 1.1/3 

theoretically  optimal  suodivisicn  is  [— ,1 J = L— , (— ) ] [(— ) , (— ) J l(~/  > 

Repeating  the  above  calculations  for  these  three  intervals,  gives  p*(x) 

= .8879377727x  m .279682S727  for  [J-,  (J-)2/3];  p*(x)  = .7047566772x 

1 2/3  1 1/3 

♦ .3523783386  for  [(-)  , (-)  ];  and  p*(x)  = .5593657454x  + .4439638663 

for  [(m)^3,i]  with  n,  = 5.54139x10  ® fcr  all  three  subintervals.  In  all 

cases,  n^2  is  cf  the  order  of  1 x 10  in  this  example.  Since  2 is  of 

the  order  1.53  x 10  ^ , it  appears  to  be  necessary  to  partition  [-7,  1]  into 

three  subintervals  to  obtain  an  accuracy  cf  at  least  2 Since  the  domain 

i 216 

of  application  in  [2]  was  X = {-£g}._ 214+1 > the  above  theory  implies  that  X 

2*  J * . . 
should  be  partitioned  into  X.^kj  X„  l j X3  where  X^  = ’^4^ , X0={-fg-}^’I 

i 216  2 0 ' 2' 

and  X3  = ^ ^285'  For  this  Fartition  the  errors  n^CX^,  r<1/2[X2] 

and  wHl  all  he  less  than  n,  /0  of  abcve  and  in  this  case  not  equal. 


'1/2 


However,  in  [3]  the  absolute  error  was  the  measure  of  accuracy  so  that 
this  subdivision  was  modified.  Specifically,  fcr  the  subdivision 
x = xxu  x2  u X3  given  above  we  have  that  (n  = n^2) 


1^*  - N1/2(pft(*»  Xi>^x)l  1 oei»/x 

for  x£  X^,  i = 1,  2 or  3 where  = (j^-)1^3,  s2  = and  E3  = 1 and 

p*(  * * X^)  denotes  the  best  linear  (relative)  starting  approximation  for 

calculating  square  roots  on  X,,  i = 1,  2 or  3.  Thus,  our  final  choice  for 

our  partition  was  X = Y;  U Y.,  U Y3  where  Y^  = (£-,  ||]  H x,  Y2  = (i|,  p-)H  X 
21 

and  Y = (— , 1]0  X.  In  addition,  the  polynomials  p*(x,  X.)  = a.x  + 6. 

O J4  111 
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were  also  modified  so  that  the  product  a^x  could  he  computed  through  the 
use  of  two  shifts  and  one  add  and  the  required  accuracy  after  one  Newton 
iteration  was  still  2 ^ (absolute  error).  As  remarked  earlier,  it  is 

not  known  how  to  optimally  subdivide  an  interval  for  the  corresponding 
absolute  error  problem.  To  some  degree,  this  would  involve  estimating 
the  error  of  this  procedure  as  a function  of  the  interval  and  this  is  a 
difficult  problem.  In  fact,  even  in  the  classical  theory  of  best  uniform 
approximation  not  much  is  known  in  this  regard. 


B.  Reciprocal  Square  Roots  - Sivide-Free  Iteration 

This  amounts  to  applying  the  above  theory  with  a = - j (n  ? -2), 

In  this  case  N^yj^Hx)  * — [3  - x*h“(x)].  In  developing  an  algorithm 

using  this  iteration  we  must  again  scale  numbers  as  in  the  square  root 
case.  Thus,  we  shall  assume  that  the  scaling  will  be  done  with  respect 
to  the  interval  [^-,  2].  In  what  follows,  we  shall  give  the  best  linear 
starting  approximations  for  this  algorithm  when  one  uses  the  interval 
[y,  2],  subdivides  it  into  two  subintervals  and  subdivides  it  into  six 
subintervals . 

Now  for  this  particular  iteration  on  an  interval  [a,  b],  the  formulas 
of  theorem  3 and  (3)  reduce  to 


T 
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. 1/2  1/2  ,3/2  -3/2  1/2.  1/2  ..  1/2  , 1/2, 

, B 2(b  + a b + a) -3ab(b  •*-  a ) 

-1/2  . 1/2.  1/2  ,3/2  ,3/2  1/2.  1/2,.  1/2  , 1/2, 

2fb  + a d +a)  +3  a b (b  + a ) 


a..  (1-D 

1/ 2.1/ 2 1/ 2 7 1/2, 

a b (b  + a ) 

- 1/2,  1/2  , 

6 • -(b  + a b + a)a 


3 - X 


Thus,  the  best  starting  approximation  from  7^  for  calculating  reciprocal 
square  roots  on  2]  via  this  iteration  is  p*(x)  **  -.4314166S17:< 

+ 1.509958386,  with  n_1/2[j,  2]  =*  1.048824252  x 10~2  and  n*1/2Ij.  2] 

- 1.64427987  * io“4. 

For  subdividing  [-|,  2]  into  two  subintervals,  we  have  by  the  previous 
theory,  [^,  2]  = [|,  1]  U [1  , 2]  is  the  optimal  partition.  For  the 
subinterval  1],  the  best  starting  approximation  from  7^  for  calculating 
reciprocal  square  roots  on  [g-,  1]  via  this  iteration  is  p*(x)  = ~.810053751Sx 
+ 1.787875129  with  n_1/2tj,  1]  - 7.37705125  x 1Q~4  and  n21/2[J,  H « 8.16115x1*:' 
Likewise,  for  the  interval  [1,  2],  p*(x)  * - . 2863972505x  + 1.264218627  with 

n-l/2[1*  21  * n-l/2(I’  11  and  n-l/2[U  21  " n-i/211*  21* 

Finally,  let  us  consider  the  form  of  this  sort  of  an  algorithm  where 

it  is  desired  to  subdivide  [^,  2]  into  six  subintervals.  In  this  case 
the  optimal  partition  is  t|.  2]- [^,  <|)2/3lUl  (£)  1/3]l[  ,21/3 ] 

U [21/3,  22/,^l  U [2^3,  2].  We  shall  only  consider  the  interval  [1,  2^3] 
since  the  other  intervals  can  be  treated  in  a like  manner.  Thus,  for 
[1,  21/3]  by  direct  calculations,  p*(x)  - -.4186992113x  + 1.416201135 
with  n.1/2[l,  21/3]  - 9.352785  x io'6  and  n21/2U,  21/3]  - 1.35  x io"10. 

Note  that,  in  addition  to  the  cost  of  the  scaling  and  determining  which 


subinterval  contains  the  scaled  number,  this  algorithm  for  one  Newton 
iteration,  consists  of  three  Multiplies,  one  add  and  one  shift. 
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C.  Cube  Roots 

For  this  particular  example  we  will  also  consider  seme  nonlinear  best  start  ing 

approximation.  As  before,  in  designing  a cube  root  routine  of  this  sort 

one  must  first  select  an  interval  of  application.  The  interval  we  shall 

use  is  [— , 1].  Thus,  the  final  algorithm  for  computing  5y",  y real,  would 

have  a scaling  routine  which  would  (i)  change  the  sign  of  y if  y < 0 and 

also  change  the  sign  (to  negative)  of  the  computed  cube  root  of  -y  prior 

to  returning  a final  approximation,  and  (ii)  scale  y (assume  y > 0) , 

i.e.  compute  y « 2^mx  where  m is  an  integer  and  x €.  (^,  1].  Then,  the  cube 

root  of  x can  be  calculated  by  any  one  of  the  following  routines;  the 

in  " 

result  will  be  multiplied  (shifted)  by  2 and  returned  for  ry  . In  what 

follows,  we  shall  give  the  best  linear  starting  approximation  for  the 

full  interval  [^,  1]  and  partitions  of  this  interval  into  3 and  6 subintervals. 

In  all  cases,  we  shall  also  give  the  (relative)  error  after  one  and  two 

Newton  iterates.  In  addition,  for  the  interval  1]  and  the  partition 

o 

of  this  interval  into  3 subintervals,  we  shall  give  the  best  starting 


approximations  from  0 _<  m,  n,  m + n * k,  k * 1,  2,  3 where  m and  n 

for  fixed  k are  chosen  so  that  the  best  starting  approximation  fron£^, 

(m,  n)  i (m,  n),  0 £ ra,  n,  m + n = m + n does  not  give  a better  relative 
\ ■ 

approximation  for  *x  after  one  newton  iteration.  These  best  starting 
approximations  were  calculated  on  a CDC-6400  where  we  found  the  best 
relative  approximation  to  on  the  interval  in  question  discretized  into 
equally  spaced  mesh  points  with  a step  size  of  h * -777  using  [7].  We 
then  multiplied  this  function  by  the  appropriate  given  by  (3)  using 

the  respective 
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First,  for  1].  For  the  class  H > we  find  that  the  best  starting 
approximat  ion  is  p*(x)  = .60554S1056x  + .4541610792  with  n.  1]  = 3.30112  - 

1/  J o 

and  n^/3  - 1.08496  * 10*5 


In  addition,  the  best  starting  approximation 

from  nx  * 11  is  better  than  the  best  starting  approximation  from 

Next,  for  k = 2,  we  find  that  for  the  three  classes,  T-,  = ^q-8’  ^ ’ 

11>^2^8’  d ^ Che  best  starting  approximation  from  ^ 

yj.  1 

is  the  best  since  the  best  relative  approximation  to  /x  from  x:^[g,  1]  on 

1 -2 
("gt  1]  gives  a relative  error  of  approximately  .650  x 10  ; whereas,  this 

_1  2 i 

corresponding  error  is  .159  * 10  and  .413  x 10  fromX  [—,  1]  and 

U o 

JPjIg*  1]*  respectively.  Using  the  code  [7],  we  find  that  the  best 

r — 1 

relative  approximation  to  7x  on  [— , 1]  is  R(x)  = .1477442S96 


.8414551422 


3 


. 73d7‘*62-*19 


with  relative  error  X 


1/3 


.6500719120  x 10  “ so  that  y1/3  = (1  - /3)“2/ 3 


1.000028174  and  the  best  starting  approximation  is  R*(x)  = 1.4774S4521 
.8414788493  . . rl  ,,  _-5  . 2 ,1 


.7387462419 


- with  n1/3[|,  1]  - 4.226074  x 10~5  and  nJ/3I|,  1]  - 1.78x10 

2 r 1 


For  k * 3 by  the  same  procedure  we  find  that^ig>  1]  Is  the  preferable 
class  from/f3[~,  1]  and  1].  For  this  class,  the 


,1  foe, 07-0 


best  starting  approximation  is  R*(x)  * ,2437995493x  + .8898929185  - 

° .i/ybub^l-o-: 

1 -7  2 1 

witn  1]  = 8.438  x 10  and  n^/glg*  1]  = 0.0  (on  our  SR-56,  should  be 

-14 

of  the  order  10  ) . 

Next,  for  the  (optimal)  partition  [g'll’tg'^'l  U [^."^l  U [“,  1 ] • Here 


we  shall  only  give  the  results  for  [j,  1].  The  best  linear  starting 
approximation  for  calculating  cube  roots  on  [^,  1]  is  p*(x)«.415350l945x 
♦ .5913005214  with  nl/3[y,  1]  - 4.407136  * IO-5  and  nJ/;j[|,  11  - 1-94  x io'9. 
As  before  on  [g-,  1],  this  class  is  preferrable  to  1], 

Once  again,  11  Is  the  preferrable  class  to  use  from  the  set 

1],  11  and  /C°[^-,  11.  The  best  starting  approximation  from 

^l^T*  11  for  calculating  cube  roots  on  1]  is  R*(x)  * 1.790709274 

1.347475985  ,1  , en,  ,„-8  , 2 .1  . . 

1 . 7035364292x  wlth  nl/3l2’  ^ “ 6,503  * 10  and  n 1/ 3^ 2 * ^ “ 0,0 

(on  our  SR-56,  should  be  of  the  order  of  10  ^6). 
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From  the  classes  /Cq[-|,  ll,/j~[y,  1],  *1  and  1],  we  again 

^2  1 

find  that  the  class  will  give  the  most  accurate  (relative)  approxi- 


mation. The  best  starting  approximation  from  1]  is  R*(x) 

\/3 

2 1 -20 
and  ll  °f  the  order  10 


with  nw,ti,  1]  * 1.5  * 10'10 


Finally,  if  one  partitions  [g,  1]  into  6 subintervals  the  optimal 

. rl  .,rl  . 1.5/2,,  ...,1,5/2  l11irl  S2..../2  l,.,rl  /2,  ;r/2 
Partition  is  [ g y - J ~ t g > ( 2^  ]U[(o)  ^ ^ 7 2 

/2 

For  the  interval  , 1]  the  best  linear  starting  approximation  is 

vv  a 

p*(x)  = .3731163725x  + .6235515591  with  n1/3(-y,  1]  = 2.77582  x io 
and  °f  the  order  10 

The  above  variety  of  possibilities  indicates  the  flexibility  of  these 
sort  of  algorithms.  Thus,  in  designing  a specific  subroutine  for  calculating 
a fixed  n^  root  one  can  try  to  minimize  the  effect  of  such  things  as  machines 
that  are  relatively  slow  in  computing  a divide  and  so  on. 
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